The genus Mycobacterium comprises more than 150 species that reside in a wide 
variety of habitats. Most mycobacteria are environmental organisms that are either not 
associated with disease or are opportunistic pathogens that cause non-transmissible 
disease in immunocompromised individuals. In contrast, a small number of species, 
such as the tubercle bacillus, Mycobacterium tuberculosis, are host-adapted pathogens 
for which there is no known environmental reservoir. In recent years, gene disruption 
studies using the host-adapted pathogen have uncovered a number of "virulence factors," 
yet genomic data indicate that many of these elements are present in non-pathogenic 
mycobacteria. This suggests that much of the genetic make-up that enables virulence in 
the host-adapted pathogen is already present in environmental members of the genus. In 
addition to these generic factors, we hypothesize that molecules elaborated exclusively by 
professional pathogens may be particularly implicated in the ability of M. tuberculosis to 
infect, persist, and cause transmissible pathology in its host species, Homo sapiens. One 
approach to identify these molecules is to employ comparative analysis of mycobacterial 
genomes, to define evolutionary events such as horizontal gene transfer (HGT) that 
contributed M. tuberculosis-specific genetic elements. Independent studies have now 
revealed the presence of HGT genes in the M. tuberculosis genome, and their role in the 
pathogenesis of disease is the subject of ongoing investigations. Here we review these 
studies, focusing on the hypothesized role played by HGT loci in the emergence of M. 
tuberculosis from a related environmental species into a highly specialized human-adapted 
pathogen. 
INTRODUCTION 

Through modification of their genome content, bacteria can evolve to exploit different
ecological niches. While vertical events such as gene duplication, chromosomal
rearrangement and gene decay can affect the shape and structure of a genome (Ventura
et al., 2007), horizontal gene transfer (HGT) is an important mechanism by which
bacteria acquire novel genetic material (Lercher and Pal, 2008; Price et al., 2008),
subsequently facilitating adaptation and diversification (Treangen and Rocha, 2011).
HGT can be mediated by transformation (acquisition of naked DNA), transduction (DNA
transfer via a bacteriophage), and conjugation (direct contact between two bacterial
cells enabling a unidirectional transfer of a plasmid or mobile element; Frost et al.,
2005). HGT has been shown to profoundly impact prokaryotic genome plasticity, allowing
the acquisition of antibiotic resistance elements (Mohd-Zain et al., 2004; Palmer
et al., 2010; Gray et al., 2013), virulence genes (Rosas-Magallanes et al., 2006; Li
et al., 2012) and new metabolic pathways (Wilson et al., 2003; Baldwin et al., 2004;
Chouikha et al., 2006; Noda-Garcia et al., 2013).

Mycobacterium, a genus of Actinobacteria, comprises mostly non-pathogenic species.
Exceptionally, this genus contains a number of host-adapted pathogens, including the
leprosy bacillus, Mycobacterium leprae, and the Johne's bacillus, M. avium subspecies
paratuberculosis, the latter defined by the presence of at least six genomic islands
that were likely acquired by HGT (Alexander et al., 2009). In this review, we focus on
the Mycobacterium tuberculosis complex (MTBC), agents of tuberculosis (TB) in their
respective mammalian hosts. Among the various subspecies of the MTBC, M. tuberculosis
sensu stricto is the cause of human TB; it infects over 2 billion people and causes an
estimated 1.3 million deaths annually (World Health Organization, 2013).

When contrasting the genome of MTBC organisms with those of the most closely related
environmental mycobacteria, M. marinum and M. kansasii, independent studies have
identified M. tuberculosis-specific genetic factors putatively acquired by HGT (Becq
et al., 2007; Veyrier et al., 2009; Supply et al., 2013), as evidenced by gene
clustering, the presence of vehicles of HGT (phage, transposons, toxin-antitoxin
genes), and an aberrant GC content in their DNA (Zaneveld et al., 2008). For example,
the M. tuberculosis genome codes for 55 proteins absent from M. kansasii, M. marinum
and all other sequenced mycobacterial genomes (Veyrier et al., 2009). As 87% of these
M. tuberculosis-specific genes are found in clusters, it has been postulated that these
clusters may be pathogenicity islands that contribute to the unique virulence of
M. tuberculosis (Hacker et al., 1997; Veyrier et al., 2009). As several of these
M. tuberculosis-specific genes have been linked to host adaptation (Sassetti and Rubin,
2003; Pethe et al., 2004), this provides further support for the notion that HGT may
have played a crucial role in the emergence of this pathogen. At the ecological level,
M. tuberculosis uses humans as its sole known reservoir, while environmental
mycobacteria such as M. kansasii can be found in various aquatic habitats (McSwiggan
and Collins, 1974; Steadham, 1980; Sartori et al., 2013; Thomson et al., 2013), further
highlighting the impact of genome remodeling on bacterial biology.

FIGURE 1 | Phylogeny of M. tuberculosis and closely related Mycobacterium species. The
un-rooted phylogenetic tree was generated by MEGA6.0 using 20 randomly selected genes
conserved across eight Mycobacterium species (Schwab et al., 2009). The blue arrows
schematically represent where putative HGT events may have occurred, resulting in
M. tuberculosis-specific genomic islands. The scale bar indicates 0.02 substitutions
per nucleotide position, and the bootstrap values calculated using the neighbor-joining
method (expressed as a percentage of 1000 replicates) are shown at the branch points.
The fast-growing species M. smegmatis is used as the out-group. Genes used are listed
below (represented as M. tuberculosis genes): Rv0001-dnaA, Rv0041-leuS,
Rv0236A-Rv0236A, Rv0248c-Rv0248c, Rv0285-PE5, Rv0287-esxG, Rv0288-esxH,
Rv1085c-Rv1085c, Rv0197-Rv0197, Rv1304-atpB, Rv1305-atpE, Rv1894c-Rv1894c,
Rv2172c-Rv2172c, Rv2392-cysH, Rv2440c-obg, Rv2477c-Rv2477c, Rv3019c-esxR, Rv3045-adhC,
Rv3392c-cmaA1, Rv3502c-hsd4A.

In this review, we briefly describe the early interplay between M. tuberculosis and the
host during an infection, followed by the bioinformatic evidence for HGT and its
potential contribution to the host-adapted lifestyle of this pathogen. To illustrate
the relationships between various mycobacteria, including the species discussed in this
manuscript, in Figure 1 we present an un-rooted phylogenetic tree based on 20 randomly
selected genes conserved across eight mycobacteria, including seven slow-growing
species (M. tuberculosis, M. canettii, M. kansasii, M. marinum, M. ulcerans, M. avium
subsp. hominissuis, M. avium subsp. paratuberculosis) and a rapid-growing species
(M. smegmatis) as the out-group. It is worth noting that the topology of this
independently generated tree is congruent with the tree built from housekeeping genes
(Veyrier et al., 2009), providing additional support for the evolutionary relationships
between these species. The genes used for tree generation are provided in the figure
legend. The organization of each M. tuberculosis-specific locus discussed is
illustrated in Figure 2, in comparison with the flanking genomic regions in the related
organisms M. kansasii and M. marinum.
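
The ortholog criteria stated in the Figure 2 legend (>50% coverage, 60% identity,
E-value below e-20) can be sketched as a simple filter over BLAST hits. This is a
minimal illustration and not the authors' pipeline: the BlastHit record, its field
names, and the function passes_ortholog_filter are hypothetical, coverage is assumed to
mean aligned length divided by query length, the identity cutoff is assumed to be
inclusive, and e-20 is read as 1e-20.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record for one protein-versus-H37Rv BLAST hit."""
    query_id: str
    subject_id: str
    percent_identity: float  # 0-100
    alignment_length: int    # aligned residues on the query
    query_length: int        # full length of the query protein
    evalue: float


def passes_ortholog_filter(hit, min_coverage=50.0, min_identity=60.0,
                           max_evalue=1e-20):
    """Apply the three thresholds quoted in the Figure 2 legend.

    Assumptions (not stated in the source): coverage is computed over the
    query, identity >= 60 counts as passing, and "e-20" means 1e-20.
    """
    coverage = 100.0 * hit.alignment_length / hit.query_length
    return (coverage > min_coverage
            and hit.percent_identity >= min_identity
            and hit.evalue < max_evalue)


# Illustrative hits with made-up identifiers and values.
good = BlastHit("query_1", "subject_1", 84.0, 300, 350, 1e-50)
short = BlastHit("query_2", "subject_2", 84.0, 100, 350, 1e-50)
weak = BlastHit("query_3", "subject_3", 84.0, 300, 350, 1e-5)
```

Only the first hit satisfies all three conditions; the second fails on coverage and the
third on E-value. Stricter or looser cutoffs change which genes are drawn as orthologs,
so the thresholds are exposed as parameters.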

M. tuberculosis AND THE HOST ENVIRONMENT 

When M. tuberculosis enters the pulmonary alveoli via the aerosol route, it is thought
to first encounter alveolar macrophages. Following phagocytosis by these macrophages,
the bacterium finds itself in the phagosomal compartment, which is, among other
attributes, iron limiting, carbon poor, hypoxic, nitrosative, and oxidative
(Schnappinger et al., 2003). M. tuberculosis is able to resist these bactericidal
effects by synthesizing antioxidants, repairing DNA and proteins, and maintaining
intracellular pH and cell wall integrity (Buchmeier et al., 2000; Master et al., 2002;
Boshoff et al., 2003; Darwin and Nathan, 2005; Vandal et al., 2008, 2009; Colangeli
et al., 2009). Moreover, M. tuberculosis can cope with hypoxic and other
growth-limiting environments using a number of tactics, such as activating the dormancy
regulon and promoting alternate metabolic pathways and iron metabolism (Leistikow
et al., 2010; Marrero et al., 2010; Ryndak et al., 2010; Griffin et al., 2011). In
addition, M. tuberculosis is able to prevent phagosome-lysosome fusion (Sun et al.,
2010; Wong et al., 2011), permeabilize the phagosomal membrane (Manzanillo et al.,
2012), and escape into the cytosol. The cytosol likely offers a less hostile, and thus
more permissive, environment where the bacteria can replicate and induce the infected
macrophages to undergo necrosis instead of apoptosis, a strategy that allows the
bacteria to infect neighboring cells, thereby enabling the perpetuation of the
infection process (van der Wel et al., 2007; Divangahi et al., 2009; Behar et al.,
2010).

FIGURE 2 | Genomic organization of operons in M. tuberculosis, M. kansasii, and
M. marinum. (A) Rv0986-0988; (B) Rv3376-3378c; (C) Rv3108-3126c; (D) Rv2954c-2961.
Protein-coding genes are represented by arrows and orthologous genes are indicated by
arrows of the same color. Yellow and blue arrows mark the "boundary" of each
M. tuberculosis-specific locus. Red arrows indicate genes discussed in this review.
Dark green arrows indicate M. tuberculosis genes with no orthologs within the
corresponding M. kansasii and M. marinum genomic regions. White arrows in the
M. kansasii and M. marinum genomes indicate genes that are present but not orthologous
to M. tuberculosis genes. Black arrows indicate transposases; orange arrows indicate
toxin-antitoxin genes. Genome organizations for M. tuberculosis, M. kansasii,
M. marinum and gene clusters were obtained from the Kyoto Encyclopedia of Genes and
Genomes (http://www.kegg.jp/) based on databases available at the Sanger Institute,
Tuberculist, and McGill University, respectively. Orthologs were verified by comparing
each predicted protein against the H37Rv genome using the BLAST program. Only proteins
with >50% coverage, 60% identity and E-value <e-20 were used.

Rv0986-0988 

In the seminal work that first described evidence of HGT in the M. tuberculosis genome,
Becq et al. (2007) detected a 5.6 kb M. tuberculosis-specific island with a reduced GC
content (53%) compared to the average for the M. tuberculosis genome (65.6%; Cole
et al., 1998). Further molecular and in silico analyses demonstrated that this operon
is present in other members of the MTBC, including M. bovis, M. africanum, and
M. microti (Rosas-Magallanes et al., 2006). Based on phylogenetic analyses, it was
proposed that three genes within this locus, Rv0986-8, had been acquired from
phylogenetically distant γ-proteobacteria via plasmid transfer. The fact that the
orthologs of these three genes are consistently found together suggests that a single
HGT event occurred during the acquisition of this operon by the ancestor of
M. tuberculosis (Rosas-Magallanes et al., 2006).
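
The GC-content signal described above (an island at 53% against a genome average of
65.6%) can be illustrated with a sliding-window scan. The sketch below is not the
method used by Becq et al. (2007); the function names, the window and step sizes, and
the 8-percentage-point deficit cutoff are arbitrary assumptions chosen only to show the
idea, and the test sequence is synthetic.

```python
def gc_content(seq):
    """Return the GC percentage of a DNA string, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


def low_gc_windows(seq, window=1000, step=500, genome_gc=65.6, min_deficit=8.0):
    """Report windows whose GC content falls below the genome average.

    genome_gc defaults to the M. tuberculosis average quoted in the text;
    window, step and min_deficit are hypothetical values for illustration.
    Returns a list of (start, end, gc_percent) tuples with 0-based starts.
    """
    hits = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        gc = gc_content(chunk)
        if genome_gc - gc >= min_deficit:
            hits.append((start, start + len(chunk), gc))
    return hits


# Synthetic example: an 80% GC background with a 1 kb stretch at 50% GC.
high_gc = "GCGCA" * 200   # 1000 bases, 80% GC
low_gc = "GCAT" * 250     # 1000 bases, 50% GC
toy_genome = high_gc + low_gc + high_gc
islands = low_gc_windows(toy_genome)
```

On this toy sequence only the window that lies entirely within the low-GC stretch is
reported. A low GC content alone does not establish HGT; as noted in the Introduction,
it is considered alongside gene clustering and the presence of mobile elements.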

Rv0986 is predicted to encode an adenosine triphosphate (ATP)-binding protein that is
orthologous to the Agrobacterium tumefaciens attE polypeptide, and to form an ABC
transporter with Rv0987 (Braibant et al., 2000; Rosas-Magallanes et al., 2006). The
A. tumefaciens attE gene is located on a plasmid which harbors the attE-H operon. This
operon has been proposed to encode an ABC transporter that secretes a host cell
adhesion factor (Matthysse et al., 1996, 2000). Intriguingly, the N- and C-terminal
sequences of Rv0987 share 40% similarity with attE and attG, and the neighboring Rv0988
shows 50% similarity with attH (Rosas-Magallanes et al., 2006).

The Rv0986-8 operon has been implicated in M. tuberculosis virulence, as mutants with
disruptions in Rv0986 and Rv0987 exhibit a reduced ability to inhibit phagosome
acidification in macrophages (Pethe et al., 2004). Furthermore, these mutants had
impaired binding to host cells, and this phenotype could be rescued by complementing
with a cosmid carrying M. tuberculosis DNA encompassing the Rv0986-8 operon
(Rosas-Magallanes et al., 2007). Although Rv0986 and Rv0987 mutants were not shown to
be attenuated in mouse lungs and spleens (Rosas-Magallanes et al., 2007), the Rv0986
mutant was subsequently shown to be less virulent in the context of central nervous
system infection (Be et al., 2008). Recently, Rv0986-8 have been found to be regulated
by EspR, a transcription factor that also regulates the ESX-1 secretion system (Blasco
et al., 2012), a major virulence mediator of M. tuberculosis (Brodin et al., 2006).

Rv3376-Rv3378c 

Another genomic island potentially acquired by HGT is the 3.1 kb region encompassing
Rv3376-8c (Becq et al., 2007; Veyrier et al., 2009). This island exhibits a reduced GC
content (54.7%) and is associated with the presence of transposases, which are known to
mediate HGT events (Becq et al., 2007; Veyrier et al., 2009). The closest genera
harboring such genes are Agrobacterium and Rhizobium (Becq et al., 2007). More
recently, Mann and Peters (2012) speculated that, while the Rv3377c amino acid sequence
shares homology with proteins from another actinomycete, Micromonospora, Rv3378c has no
ortholog in any other organism with the exception of a hypothetical protein in amoeba.
These observations suggest that these MTBC-specific genes originated from different
sources (Mann and Peters, 2012).

Biochemical characterization has revealed that Rv3377c and Rv3378c encode a
halimadienyl diphosphate (HPP) synthase and a diterpene synthase, respectively. In a
step-wise fashion, these enzymes catalyze the cyclization of the precursor,
geranylgeranyl diphosphate (GGPP), and the hydrolysis of the HPP intermediate to
produce isotuberculosinol (isoTB), a diterpene species (Nakano et al., 2005; Mann
et al., 2009a,b). Terpenes are among the most widespread and chemically diverse classes
of compounds found in nature. They are hydrocarbons made up of five, or multiples of
five, carbon units (Simion, 2005). In plants and fungi, they are commonly found as
essential as well as secondary metabolites involved in signaling and defense
(Buckingham, 1994). While members of the Actinobacteria group synthesize a plethora of
natural products (Baltz, 2008), very few are known to encode diterpene (C20) synthases
(Dairi et al., 2001; Hamano et al., 2002; Dürr et al., 2006; Smanski et al., 2011).

During macrophage infection, M. tuberculosis mutants with a disruption in either
Rv3377c or Rv3378c showed a marked defect in arresting phagosome acidification as well
as in intracellular survival, suggesting that these genes are involved in the
modulation of the early infection process (Pethe et al., 2004). Intriguingly, these
genes are not transcriptionally altered during macrophage infection (Stewart et al.,
2005; Rohde et al., 2007; Waddell and Butcher, 2007), implying that the synthesis of
isoTB is regulated at the protein level, potentially triggered by ambient magnesium
levels (Mann et al., 2009a, 2011). In addition, isoTB has been shown to inhibit
phagosome acidification by 0.5 pH units as well as proteolytic activity (Mann et al.,
2009b). Recently, Rv3378c has been characterized as a tuberculosinyl transferase that
converts isoTB into the proposed end product, tuberculosinyladenosine (TbAd; Layre
et al., 2014). The cellular mechanism by which Rv3377-8c modifies phagosome function
thus remains to be further investigated.

Rv3108-3126c 

Rv3108-3126c is a 15.1 kb, MTBC-specific genomic island that has a reduced GC content
(56.7%) and contains genes encoding insertion sequences and a transposase, features
typical of HGT (Becq et al., 2007). The proposed donor genera include Burkholderia,
Corynebacterium, and Pseudomonas. This island contains two potential virulence genes:
Rv3111 and Rv3114.

A transposon mutant of Rv3111 (moaC1) was first shown to be attenuated for replication
in macrophages (Rosas-Magallanes et al., 2007). In a more recent high-throughput
genetic screen, mutants disrupted for the genes moaC1/moaD1 (Rv3112), implicated in
molybdenum cofactor biosynthesis, were found to be trafficked rapidly to acidified
intracellular compartments (Brodin et al., 2010), potentially providing an explanation
for the impaired intracellular growth. Molybdopterin is the main building block of the
molybdenum cofactor (MoCo) and can be found in enzymes that catalyze redox reactions in
carbon, nitrogen, and sulfur utilization (Williams et al., 2011). moeB1, a homologous
gene potentially involved in MoCo biosynthesis, has also been shown to be required for
arresting phagosome acidification (MacGurn and Cox, 2007). The moaC1 mutant itself has
exhibited reduced virulence in macrophages as well as primate lungs (Rosas-Magallanes
et al., 2007; Dutta et al., 2010). Further investigation of how MoCo-mediated redox
reactions alter the intraphagosomal environment should provide more insight into the
cellular processes employed by M. tuberculosis to adapt to the mammalian host
environment.

Rv3114 has been shown to be required for M. tuberculosis persistence in the mouse
spleen (Sassetti and Rubin, 2003) and is temporally regulated during infection (Talaat
et al., 2004). It encodes a putative nucleoside deaminase involved in nucleotide
metabolism (Akhter et al., 2008).

Rv2954c-2961 

Rv2954c-2961 make up a 7 kb genomic island with a low GC content (53.6%) and a
transposase gene (Becq et al., 2007). A phylogenetic analysis of multiple mycobacterial
genome sequences proposed a step-wise acquisition of the genes within this locus.
Specifically, some genes are present in slow-growing, but not rapid-growing,
mycobacteria, suggesting that they were acquired by the common ancestor of the
slow-growing species. Conversely, other genes in this island are specific to
M. tuberculosis and are therefore inferred to have been acquired after the common
ancestor with M. kansasii and M. marinum (Veyrier et al., 2009).
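
The step-wise inference described above rests on presence/absence patterns across
species. The sketch below expresses that reasoning in its simplest form. It is a
deliberately naive, hypothetical illustration rather than the analysis of Veyrier
et al. (2009): the species sets, the function infer_acquisition, and its output labels
are assumptions, and it ignores gene loss and independent acquisition, which a real
phylogenetic analysis must consider.

```python
# Species groupings follow the text; the set contents are a simplification.
SLOW_GROWERS = {"M. tuberculosis", "M. kansasii", "M. marinum"}
RAPID_GROWERS = {"M. smegmatis"}


def infer_acquisition(present_in):
    """Classify a gene from the set of species in which an ortholog is found.

    Naive parsimony: presence in any rapid grower is read as ancestral,
    presence only among slow growers as acquisition by their ancestor, and
    presence only in M. tuberculosis as a more recent acquisition.
    """
    present = set(present_in)
    if "M. tuberculosis" not in present:
        return "absent from M. tuberculosis"
    if present & RAPID_GROWERS:
        return "ancestral to the genus"
    if present & (SLOW_GROWERS - {"M. tuberculosis"}):
        return "acquired by a slow-grower ancestor"
    return "acquired after divergence from M. kansasii and M. marinum"


# Hypothetical presence/absence patterns (placeholder gene names).
patterns = {
    "gene_a": {"M. tuberculosis", "M. kansasii", "M. marinum", "M. smegmatis"},
    "gene_b": {"M. tuberculosis", "M. kansasii", "M. marinum"},
    "gene_c": {"M. tuberculosis"},
}
calls = {gene: infer_acquisition(species) for gene, species in patterns.items()}
```

Under these assumptions, gene_b corresponds to the first class of genes described above
and gene_c to the M. tuberculosis-specific class.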

Genes within this island are involved in the synthesis and modification of phenolic
glycolipids (PGLs), complex lipids located in the outermost layer of the mycobacterial
cell envelope. PGLs are composed of long-chain fatty acid backbones with a phenol ring
and methylated sugars, including two rhamnosyl residues and a terminal fucosyl residue
(Onwueme et al., 2005). PGLs are only produced by the members of the MTBC and related
slow-growing mycobacteria, yet even among the mycobacteria that make PGL, there are
species-specific modifications in the carbohydrate moiety (Daffe and Laneelle, 1988;
Onwueme et al., 2005; schematically illustrated in Veyrier et al., 2011). The PGLs have
been implicated in aspects of mycobacterial pathogenicity such as oxidative stress
resistance (Chan et al., 1989), cell tropism (Ng et al., 2000; Rambukkana, 2001), and
immunomodulation (Reed et al., 2004; Guenin-Mace et al., 2009; Cambier et al., 2014).

Frontiers in Microbiology | Evolutionary and Genomic Microbiology
April 2014 | Volume 5 | Article 139 | 4
Wang and Behr
HGT and M. tuberculosis

The major type of PGL produced by M. tuberculosis is denoted PGL-tb (Simeone et al.,
2007). While most genes involved in the synthesis of the lipid core and carbohydrates
have been characterized, the enzymes responsible for O-methylating the fucosyl residue
remained elusive until recently. It is now known that Rv2954c-Rv2956 code for the
methyltransferases that are responsible for O-methylation of the terminal fucosyl
residue (Veyrier et al., 2009; Simeone et al., 2013). These three proteins catalyze the
O-methylation of the hydroxyl groups of the terminal fucosyl residue of PGL-tb in a
sequential process (Simeone et al., 2013).

In other pathogenic mycobacteria that lack the Rv2954c, Rv2955c, and Rv2956 orthologs,
such as M. marinum and M. leprae, the PGLs do not contain the terminal O-methylated
fucosyl residue (Onwueme et al., 2005). Although M. kansasii does not possess Rv2954c
or Rv2955c, it does encode an enzyme that is highly similar to Rv2956 (84%), and its
PGL contains four sugar residues (Riviere et al., 1987; Onwueme et al., 2005).
Interestingly, Rv2954c and Rv2955c have been found to be virulence genes during
macrophage infection (Rosas-Magallanes et al., 2007), and Rv2954c was induced upon
exposure to lung surfactant (Schwab et al., 2009).

As the enzymes encoded within this island have been observed to catalyze the transfer
of functional groups from one molecule to another, they may play an important role in
"decorating" existing mycobacterial products and fine-tuning host responses toward the
organism to optimize its intracellular survival (Veyrier et al., 2011).

CONCLUSION 

Phylogenetic analyses have been used as robust and reliable tools 
for identifying potential HGT loci in M. tuberculosis and other 
pathogenic mycobacteria. However, the biological relevance of 
most of these genomic regions remains to be delineated. In this 
review we examine four examples of how such putative HGT 
genes can affect the physiology of the pathogen and its interac- 
tion with the host. The functional characterization of these and 
other putative HGT-associated genes will allow us to understand 
whether and how HGT events have contributed to the pathogen- 
esis of M. tuberculosis, ultimately guiding the development of new 
diagnostic tests and vaccines against this particularly successful 
pathogen. 